Asynchronous Distributed Gibbs Sampling (Preprint Version 0.1)
Abstract
Gibbs sampling is a Markov chain Monte Carlo (MCMC) method for numerically approximating integrals of interest in Bayesian statistics and other mathematical sciences. Because MCMC methods typically scale poorly when the integral in question is high-dimensional (for example, in Bayesian problems involving large data sets), researchers have sought ways to speed up the computation. We present a novel scheme that approximates any integral for which a Gibbs sampler exists in parallel, with no synchronization or locking, thereby avoiding the typical performance bottlenecks of parallel algorithms. We provide three examples that offer numerical evidence of the scheme's convergence and illustrate some of the algorithm's scaling properties. Because our hardware resources are limited, we have not yet reached a point at which the algorithm stops scaling, so its true capabilities remain unknown. The convergence proof for our scheme is a work in progress, and we defer it to a future publication.
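The abstract does not spell out the scheme itself, so the following is only a hedged, self-contained toy, not the authors' algorithm. It contrasts an ordinary sequential Gibbs sampler with a simulation of "no synchronization or locking", in which each worker updates its own coordinate from a possibly stale copy of the other. The target (a standard bivariate normal with correlation `rho`), the function names `gibbs_bivariate_normal`, `simulated_async_gibbs`, and `summarize`, and the delivery probability `p_deliver` are all hypothetical choices made for illustration. Staleness is simulated with a single seeded random generator rather than real threads, which keeps the run reproducible.

```python
import math
import random


# Toy target (an assumption, not from the paper): a standard bivariate normal
# with correlation rho, whose full conditionals are
#   x | y ~ N(rho * y, 1 - rho^2)   and   y | x ~ N(rho * x, 1 - rho^2).

def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Ordinary sequential Gibbs sampler: each update sees the latest state."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        samples.append((x, y))
    return samples


def simulated_async_gibbs(rho, n_iter, p_deliver=0.5, seed=0):
    """Hypothetical simulation of lock-free updates (not the authors' scheme).

    Worker 0 owns x and worker 1 owns y. Each worker samples from its own,
    possibly stale, copy of the other coordinate; a fresh value reaches it
    only with probability p_deliver per step. No worker waits for another.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    y_seen_by_0, x_seen_by_1 = y, x  # each worker's local view of the other
    samples = []
    for _ in range(n_iter):
        # Randomly delayed "messages" stand in for synchronization barriers.
        if rng.random() < p_deliver:
            y_seen_by_0 = y
        if rng.random() < p_deliver:
            x_seen_by_1 = x
        # Both workers update concurrently from their local views.
        x, y = rng.gauss(rho * y_seen_by_0, sd), rng.gauss(rho * x_seen_by_1, sd)
        samples.append((x, y))
    return samples


def summarize(samples, burn_in=1000):
    """Return (mean_x, mean_y, var_x, var_y, corr_xy) after discarding burn-in."""
    kept = samples[burn_in:]
    n = len(kept)
    mx = sum(s[0] for s in kept) / n
    my = sum(s[1] for s in kept) / n
    vx = sum((s[0] - mx) ** 2 for s in kept) / n
    vy = sum((s[1] - my) ** 2 for s in kept) / n
    cxy = sum((s[0] - mx) * (s[1] - my) for s in kept) / n
    return mx, my, vx, vy, cxy / math.sqrt(vx * vy)


if __name__ == "__main__":
    for name, sampler in [("sequential", gibbs_bivariate_normal),
                          ("simulated async", simulated_async_gibbs)]:
        mx, my, vx, vy, r = summarize(sampler(0.8, 100_000))
        print(f"{name:>15}: mean=({mx:+.3f}, {my:+.3f}) "
              f"var=({vx:.3f}, {vy:.3f}) corr={r:.3f}")
```

Running it prints sample means, variances, and the x-y correlation for each sampler. For the sequential sampler the correlation should land near 0.8. In the simulated asynchronous version the marginal variances still stay near 1, but because each stale value is independent of the fresh noise in the other coordinate, the x-y correlation is bounded by rho² (0.64 here) in magnitude. The toy therefore illustrates why unsynchronized updates need their own convergence argument, which is the proof the abstract defers; it makes no claim about how the paper's actual scheme behaves.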
Related articles
Distributed Matrix Factorization using Asynchrounous Communication
Using the matrix factorization technique in machine learning is very common mainly in areas like recommender systems. Despite its high prediction accuracy and its ability to avoid over-fitting of the data, the Bayesian Probabilistic Matrix Factorization algorithm (BPMF) has not been widely used on large scale data because of the prohibitive cost. In this paper, we propose a distributed high-per...
Techniques for proving Asynchronous Convergence results for Markov Chain Monte Carlo methods
Markov Chain Monte Carlo (MCMC) methods such as Gibbs sampling are finding widespread use in applied statistics and machine learning. These often require significant computational power, and are increasingly being deployed on parallel and distributed systems such as compute clusters. Recent work has proposed running iterative algorithms such as gradient descent and MCMC in parallel asynchronous...
Asynchronous Distributed Learning of Topic Models
Distributed learning is a problem of fundamental interest in machine learning and cognitive science. In this paper, we present asynchronous distributed learning algorithms for two well-known unsupervised learning frameworks: Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Processes (HDP). In the proposed approach, the data are distributed across P processors, and processors indepen...
Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling
Gibbs sampling is a Markov chain Monte Carlo technique commonly used for estimating marginal distributions. To speed up Gibbs sampling, there has recently been interest in parallelizing it by executing asynchronously. While empirical results suggest that many models can be efficiently sampled asynchronously, traditional Markov chain analysis does not apply to the asynchronous case, and thus asy...
Asynchronous Distributed Estimation of Topic Models for Document Analysis
Given the prevalence of large data sets and the availability of inexpensive parallel computing hardware, there is significant motivation to explore distributed implementations of statistical learning algorithms. In this paper, we present a distributed learning framework for Latent Dirichlet Allocation (LDA), a well-known Bayesian latent variable model for sparse matrices of count data. In the p...